Learning to Cooperate via Policy Search

Authors

  • Leonid Peshkin
  • Kee-Eung Kim
  • Nicolas Meuleau
  • Leslie Pack Kaelbling

Abstract

Cooperative games are those in which both agents share the same payoff structure. Value-based reinforcement-learning algorithms, such as variants of Q-learning, have been applied to learning cooperative games, but they only apply when the game state is completely observable to both agents. Policy search methods are a reasonable alternative to value-based methods for partially observable environments. In this paper, we provide a gradient-based distributed policy-search method for cooperative games and compare the notion of local optimum to that of Nash equilibrium. We demonstrate the effectiveness of this method experimentally in a small, partially observable simulated soccer domain.
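The abstract names the method but not its update rule. As a rough illustration only, the Python sketch below assumes the simplest possible setting: a stateless two-agent game with a hypothetical shared payoff matrix `PAYOFF`, softmax policies, and a REINFORCE-style likelihood-ratio gradient with a running-average reward baseline. Each agent updates only its own parameters from the common reward, which is what makes the learning distributed. It is not the authors' algorithm and it drops the partial observability of their soccer domain; all names (`PAYOFF`, `softmax`, `sample`, `train`) and hyperparameters are hypothetical.

```python
import math
import random

# Hypothetical common-payoff coordination game (not taken from the paper):
# both agents receive PAYOFF[a1][a2] after choosing actions a1 and a2.
PAYOFF = [[10.0, 0.0],
          [0.0, 5.0]]


def softmax(theta):
    """Turn a list of logits into action probabilities (numerically stable)."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def sample(probs, rng):
    """Draw an action index from a discrete distribution."""
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def train(steps=5000, lr=0.05, seed=0):
    """Distributed REINFORCE: each agent ascends the shared reward independently."""
    rng = random.Random(seed)
    thetas = [[0.0, 0.0], [0.0, 0.0]]  # one logit vector per agent
    baseline = 0.0                     # running average of the shared reward
    for _ in range(steps):
        probs = [softmax(th) for th in thetas]
        actions = [sample(p, rng) for p in probs]
        reward = PAYOFF[actions[0]][actions[1]]  # same reward for both agents
        advantage = reward - baseline
        baseline += 0.01 * (reward - baseline)
        for i in range(2):
            # Each agent uses only its own action and policy:
            # d/d theta_k log pi(a) = 1[k == a] - pi(k) for a softmax policy.
            for k in range(2):
                grad_log = (1.0 if k == actions[i] else 0.0) - probs[i][k]
                thetas[i][k] += lr * advantage * grad_log
    return [softmax(th) for th in thetas]


if __name__ == "__main__":
    for agent, p in enumerate(train()):
        print(f"agent {agent}: P(action 0) = {p[0]:.3f}, P(action 1) = {p[1]:.3f}")
```

Calling `train()` returns each agent's final action distribution. In this toy game both coordinated joint actions, (0, 0) with payoff 10 and (1, 1) with payoff 5, are Nash equilibria and local maxima of the expected shared payoff, so gradient ascent can in principle settle at either one, even though only (0, 0) is globally optimal. Miscoordinated choices are never reinforced here, because they pay 0 and the baseline never goes negative.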

Similar resources

Guided Policy Search via Approximate Mirror Descent

Guided policy search algorithms can be used to optimize complex nonlinear policies, such as deep neural networks, without directly computing policy gradients in the high-dimensional parameter space. Instead, these methods use supervised learning to train the policy to mimic a “teacher” algorithm, such as a trajectory optimizer or a trajectory-centric reinforcement learning method. Guided policy...

Transfer Learning for Policy Search Methods

An ambitious goal of transfer learning is to learn a task faster after training on a different, but related, task. In this paper we extend a previously successful temporal difference (Sutton & Barto, 1998) approach to transfer in reinforcement learning (Sutton & Barto, 1998) tasks to work with policy search. In particular, we show how to construct a mapping to translate a population of policies...

Sequential Classification-Based Optimization for Direct Policy Search

Direct policy search, which employs optimization algorithms to search for the policy parameters that maximize total reward, often yields high-quality policies in complex reinforcement learning problems. Classification-based optimization is a recently developed framework for derivative-free optimization, which has been shown to be effective and efficient for non-convex optimization...

Variational Policy Search via Trajectory Optimization

In order to learn effective control policies for dynamical systems, policy search methods must be able to discover successful executions of the desired task. While random exploration can work well in simple domains, complex and high-dimensional tasks present a serious challenge, particularly when combined with high-dimensional policies that make parameter-space exploration infeasible. We present...

Publication date: 2000